MedEval-six test collections in one

Author

  • Karin Friberg
Abstract

Information retrieval is a field of research reaching from computer and information science to linguistics. As a linguist in the information retrieval field, I leave the quest for effective search engines and evaluation models to others, and focus on language aspects. What features do words, and parts of words such as compound constituents, that are successful in queries have in common? Does the domain of search terms have an impact in a domain-specific environment? Can search terms with certain features help users of different categories find documents suited for them? This paper describes the making of an information retrieval test collection which made it possible to study these questions. The test collection will be used to Evaluate search strategies to retrieve Medical documents, hence the name. To study language aspects of information retrieval, a new test collection was called for: a collection which was domain specific, which regarded user groups, and which had double indexes for split and unsplit compounds. Since there was no such collection, we built MedEval, a Swedish medical test collection, with documents marked for target groups (professionals and laypersons), with a system allowing choice of user group, and with two indexes that treat compounds in different ways.

Table 1: The genres of the MedEval document sources. (D. Kokkinakis, p.c.)

| Type of source | Number of documents | Percent of documents | Number of tokens | Percent of tokens |
| Journals and periodicals | 8 453 | 20.0 | 5.3 million | 34.6 |
| Specialized sites | 14 631 | 34.6 | 2.9 million | 19.1 |
| Pharmaceutical companies | 9 200 | 21.8 | 2.3 million | 14.8 |
| Faculties, institutes, hospitals and government | 2 955 | 7.0 | 2.0 million | 13.3 |
| Health-care communication companies | 4 036 | 9.6 | 1.7 million | 11.3 |
| Media (TV, daily newspapers) | 2 980 | 7.1 | 1.0 million | 6.9 |
| Total | 42 255 | 100.1 | 15.2 million | 100 |
In accordance with the Cranfield paradigm, the MedEval test collection is based on three parts: a set of documents, a set of topics, and a set of known relevant documents with respect to each of the topics (Cleverdon, 1967).

1 The Document Collection

The MedEval test collection is built on documents from the MedLex corpus (Kokkinakis, 2004). MedLex consists of scientific articles from medical journals, teaching material, guidelines, patient FAQs, health care information, etc. The set of documents used in MedEval is a snapshot of MedLex in October 2007, approximately 42 200 documents or 15.2 million tokens. See Table 1. For the MedEval test collection the documents are stored in the trectext format. The documents have IDs that reveal the source, and they are tokenized and tagged.

Kristiina Jokinen and Eckhard Bick (Eds.) NODALIDA 2009 Conference Proceedings, pp. 223–226
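As an illustration of the storage step described above, the following is a minimal sketch of wrapping one document in the DOC / DOCNO / TEXT layout commonly used for TREC-style text collections. It is not the MedEval authors' code. The function name `to_trectext`, the ID `journal-000001` (a guess at what a source-revealing ID might look like), and the sample sentence are all hypothetical, and the sketch ignores the tokenization and tagging mentioned in the text.

```python
def to_trectext(doc_id: str, text: str) -> str:
    """Wrap one document in a TREC-style DOC / DOCNO / TEXT record.

    Hypothetical helper for illustration; not taken from MedEval.
    """
    # The layout is tag-delimited, so stray closing tags in the body
    # would end the record early. Reject them rather than guess at escaping.
    if "</TEXT>" in text or "</DOC>" in text:
        raise ValueError("document text must not contain closing TREC tags")
    return (
        "<DOC>\n"
        f"<DOCNO>{doc_id.strip()}</DOCNO>\n"
        "<TEXT>\n"
        f"{text.strip()}\n"
        "</TEXT>\n"
        "</DOC>\n"
    )


# Assumed ID scheme: a source prefix followed by a running number.
sample = to_trectext("journal-000001", "Exempeltext om diabetes.")
print(sample)
```

Putting the source in the document ID, as the text describes, means a retrieved ID can be traced back to its genre without a separate lookup table.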

Similar resources

MedEval — A Swedish medical test collection with doctors and patients user groups

BACKGROUND Test collections for information retrieval are scarce. Domain specific test collections even more so, and medical test collections in the Swedish language non-existent prior to the making of the MedEval test collection. Most research in information retrieval has been performed in the English language, thus most test collections contain English documents. However, English is morpholog...

MedEval- A Swedish Medical Test Collection with Doctors and Patients User Groups

MedEval is a Swedish medical test collection where assessments have been made, not only for topical relevance, but also for target reader group: Doctors or Patients. The user of the test collection can choose if s/he wishes to search in the Doctors or the Patients scenarios where the topical relevance assessments have been adjusted with consideration to user group, or to search in a scenario wh...

PERCUTANEOUS DRAINAGE OF ABDOMINAL ABSCESSES AND FLUID COLLECTIONS

This report summarizes the results of 64 percutaneous catheter drainages of abdominal abscesses and fluid collections in 56 patients. Aspiration and drainage were guided with computed tomography in 34 patients and with ultrasound in 30 patients. Success rate was 90%. Infected collections were successfully drained in 94% and noninfected collections in 72%. Partial success was achieved in two...

Genetic relationships among collections of the Persian sturgeon, Acipenser persicus, in the south Caspian Sea detected by mitochondrial DNA restriction fragment length polymorphisms

In the present study, mitochondrial DNA polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) assay was used to assess the population structure and genetic relationships among six Persian sturgeon, Acipenser persicus populations from south Caspian Sea along the Iranian coast. The complete nucleotide dehydrogenase subunit 5 (NADH 5) region of mtDNA amplified by PCR was di...

Documentation of polychrome painted slip-ware pottery on a white ground (Tehran Bonyad Museum)

Cultural Institute of Tehran’s Bonyad Museums includes the most exquisite collections in separate sections. The pottery vault is one part of this museum. The mentioned collection is one of the most unique collections existing in the country. There are many pottery wares in this collection, whose production centers and dates are not clearly known. In the present research, six samples of polychro...

Publication date: 2009